Whether you decide to take this course with your colleagues or on your own, we know you will benefit from its overall objective to cultivate the practice of integrated data analysis. We recommend that teams determine how they can best learn together. Our goal is to provide you with enough guidance to help you make this learning experience best suit your team's needs.
You will be introduced to a variety of learning materials throughout the four modules of this course. Here are some suggestions for how you might use the course material:
The course is organized along these categories: Think, Watch, Do, Best Practices, Deep Dive.
The following guiding symbols will facilitate your navigation and will guide you throughout the course content.
There will be 4 modules:
Imagine that your manager has asked you to lead an analytical project.
A client needs information on the educational attainment of different groups within the Canadian population.
How would you get started? Take a moment to think about it, or brainstorm with your team. Write down 1 or 2 first steps in the box below.
So, how do you get an analytical project started?
Let’s start by watching a first short video: Making an analytical plan (8:13)
This video introduces you to the steps involved in the analytical process and describes the first two steps of the process.
In the video, you learned that the analytical process can be viewed as a series of steps designed to answer a well-defined question. Once the topic has been defined, the next step is to create an analytical plan. Always incorporate the feedback you receive during the planning stage of your analytical project.
The following actions are all part of the first steps of an analytical project. We provide more details for each action in the next pages.
Meet with the client to discuss their needs in more detail.
It’s important to know where you are going and get buy-in from clients before diving into the analysis. During your meeting with the client, you seek to clarify their needs because their original request is broad. Which population groups are they interested in? What is the desired level of detail or disaggregation? Why do they need this information: how will this information help the client in their decision-making?
You learn that the client is particularly interested in information about the educational attainment of women living in urban versus more rural areas. This topic has relevance for this client since they run a program to support the economic development and growth of rural communities. The relevant age group for this analysis would be people in the core working-age population (25 to 64 years old). The client is also interested in characterizing educational outcomes of different groups of women based on racialized group membership.
Search for previously published information. It’s important to understand what is already known on the topic. In your review of the literature, you notice that there is indeed a lack of detailed information on education outcomes for women from different racialized groups by area of residence in Canada. This study would make a significant contribution to the existing body of knowledge on educational outcomes of different population groups within Canada.
Identify a suitable data source. After careful consideration and with the agreement of the client, you have decided to use the long-form Census of Population as your data source because of its representativeness and large sample size. Information on highest level of educational attainment, population group, gender, area of residence, and more is available in the census. As a StatCan employee, you have access to the raw data.
You now have a better grasp of your analytical objective. Can you articulate the overall research question for your project?
Take a moment to think about it, or brainstorm with your team. Write your question in the box below.
It is always important to define the value that your analysis adds either to your organization, your client, or to our understanding of the topic. Why is your question relevant? Why should we care about your work?
Prepare an analytical plan and circulate it for review and approval.
You are now ready to put together a one-pager describing your analytical plan for review and approval. The analytical plan should contain a short description of the background of the project along with your research questions or objectives, the data source(s) that will be used, the planned methodology, timelines, etc.
It’s a good idea at this point in the process to take the time to create a template table which, when populated with data, will allow you to provide answers to your research question. If you can’t translate your question into a data table, you probably can’t answer your question with the available data! Carrying out your analysis will be easier if you have mock-up tables to fill.
Take a moment to think about what your table would look like. Take a pen and a piece of paper and try to create your (empty) table template.
Your research question is:
What are the educational outcomes of women living in urban versus more rural areas, and are there differences by racialized group membership?
Here is a good table template to address your research question:
This table would allow you to compare the proportions of women with no diploma, a high school diploma, and postsecondary education. It covers all women, women from the total racialized population, and each of the 6 largest racialized groups in Canada (South Asian, Chinese, Black, Filipino, Arab, and Latin American), with separate figures for those living in urban areas and those living in rural areas.
You are ready to move on to the next steps of the project!
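If you prefer to build your mock-up on screen rather than on paper, the template described above can be sketched in a few lines of Python. This is only an illustration: the row and column labels come from the description above, while the layout (one row per area and education level, one column per group) and the function names `build_template` and `render` are assumptions made for this example.

```python
# Labels taken from the table description; the layout is an assumption.
EDUCATION_LEVELS = ["No diploma", "High school diploma", "Postsecondary education"]
AREAS = ["Urban", "Rural"]
GROUPS = [
    "All women",
    "Total racialized population",
    "South Asian", "Chinese", "Black", "Filipino", "Arab", "Latin American",
]


def build_template(areas, levels, groups):
    """Return an empty mock-up table: one row per (area, level), one cell per group."""
    return {
        (area, level): {group: None for group in groups}
        for area in areas
        for level in levels
    }


def render(template, groups):
    """Format the template as pipe-delimited text, leaving empty cells blank."""
    lines = [" | ".join(["Area", "Education level"] + groups)]
    for (area, level), cells in template.items():
        values = ["" if cells[g] is None else f"{cells[g]:.1f}%" for g in groups]
        lines.append(" | ".join([area, level] + values))
    return "\n".join(lines)


template = build_template(AREAS, EDUCATION_LEVELS, GROUPS)
print(render(template, GROUPS))
```

Every cell starts empty (`None`). If you cannot say which number belongs in a given cell, that is a sign the research question needs to be refined before the analysis starts.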
With your manager and your client, you have identified a relevant research question to address along with an appropriate data source to leverage, and you have received feedback (and approval to go ahead) on your analytical plan.
What would be your next steps? Take a moment to think about it, or brainstorm with your team.
Write down one or two next steps in the box below.
Now that you have learned how to plan an analytical project, we will discuss best practices for preparing and analyzing your data.
To complete the activity, watch the video: Implementing the analytical plan (6:11).
This video will take you through the third and fourth steps of the analytical process.
Taking the time to define your concepts using the appropriate standards is a crucial part of the analytical process. Looking back at your research question, which concepts will you need to define?
Your research question is:
What are the educational outcomes of women living in urban versus more rural areas, and are there differences by racialized group membership?
Take a moment to think about it, or brainstorm with your team. Write down which concepts you will need to define in the box below.
For your project, you need to define your concepts of educational outcome, women, urban vs. rural areas of residence, and racialized group membership. You consult the 2021 Census dictionary for standards and definitions.
Now that you have defined your concepts using census standards, you can finalize your table template to reflect your decisions.
Here is a good table template reflecting your decisions:
Assessing the accuracy and validity of your data is an important part of the analytical process. It is therefore important to double-check your dataset to identify any values that appear invalid or somehow missing, and that could mislead your analyses.
In the following video, we present methods to describe accuracy in terms of validity and correctness. We also discuss methods to validate and check the accuracy of data values.
To complete the activity, watch the video: Data Accuracy and Validation: Methods to ensure the quality of data (10:29)
Taking the time to understand and verify the dataset(s) you will work with is a crucial part of the analytical process.
It can be as simple as double-checking your dataset(s) to identify any values that appear invalid or somehow missing, and that could mislead your analyses.
Take a look at the dataset below. There are some problematic values, which can be classified as:
Can you spot problematic values?
| | Age in years | Gender | Visible Minority |
|---|---|---|---|
| Person 1 | 34 | Man | Yes |
| Person 2 | 102 | — | Canada |
| Person 3 | 56 | Woman | No |
| Person 4 | 999 | Woman | Yes |
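Checks like these can also be automated. The sketch below runs a few simple rules over the four records shown above. The rules themselves are assumptions chosen for illustration (a maximum plausible age of 120, and "Yes" or "No" as the only expected values for Visible Minority); they are not official census edit rules, and `check_record` is a hypothetical helper name.

```python
# Records transcribed from the table above; the dash shown for
# Person 2's gender is recorded as None (missing).
records = [
    {"id": "Person 1", "age": 34, "gender": "Man", "visible_minority": "Yes"},
    {"id": "Person 2", "age": 102, "gender": None, "visible_minority": "Canada"},
    {"id": "Person 3", "age": 56, "gender": "Woman", "visible_minority": "No"},
    {"id": "Person 4", "age": 999, "gender": "Woman", "visible_minority": "Yes"},
]

# Assumed validation rules, for illustration only.
MAX_PLAUSIBLE_AGE = 120
VALID_VISIBLE_MINORITY = {"Yes", "No"}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    age = record["age"]
    if age is None:
        problems.append("age: missing")
    elif not 0 <= age <= MAX_PLAUSIBLE_AGE:
        problems.append(f"age: {age} is outside 0-{MAX_PLAUSIBLE_AGE}")

    if record["gender"] is None:
        problems.append("gender: missing")

    visible_minority = record["visible_minority"]
    if visible_minority is None:
        problems.append("visible_minority: missing")
    elif visible_minority not in VALID_VISIBLE_MINORITY:
        problems.append(f"visible_minority: unexpected value {visible_minority!r}")

    return problems


issues = {record["id"]: check_record(record) for record in records}
for person, problems in issues.items():
    for problem in problems:
        print(f"{person}: {problem}")
```

Note that an age of 102 passes this range check, yet it falls outside the 25 to 64 age group targeted by your project. Automated rules catch obvious problems, but they do not replace a careful look at the data.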
Try to reproduce numbers that have been previously published with your data source or a similar data source.
Even though you did not find any information on your specific topic, you did find previously published information on education outcomes of men and women using the census. You double-check if your proportions of women by highest level of education attained match the ones that have been published.
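A comparison like this can be scripted so that it is easy to repeat. The sketch below flags any category where your computed proportion differs from the published one by more than a chosen tolerance. The function name `compare_to_published`, the tolerance of 0.5 percentage points, and all the numbers are placeholders for illustration; they are not real census figures.

```python
def compare_to_published(computed, published, tolerance=0.5):
    """Return categories whose computed value is missing or differs from the
    published value by more than `tolerance` percentage points."""
    mismatches = {}
    for category, published_value in published.items():
        if category not in computed:
            mismatches[category] = (None, published_value)
        elif abs(computed[category] - published_value) > tolerance:
            mismatches[category] = (computed[category], published_value)
    return mismatches


# Placeholder values only (not real published or computed estimates).
published = {
    "No certificate, diploma or degree": 10.0,
    "High school diploma": 30.0,
    "Postsecondary education": 60.0,
}
computed = {
    "No certificate, diploma or degree": 10.2,
    "High school diploma": 29.9,
    "Postsecondary education": 57.0,
}

mismatches = compare_to_published(computed, published)
for category, (mine, theirs) in mismatches.items():
    print(f"{category}: computed {mine}, published {theirs}")
```

A mismatch does not always mean an error. It can also point to a difference in the population covered, the definitions used, or the reference year, so it is a prompt to investigate rather than a verdict.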
Now that you have familiarized yourself with your data set, and have cleaned and prepared it for analysis, it seems like you are finally ready to analyze your data.
How would you get started with your analysis?
Analyzing data is the step where data turns into information. This is where it gets interesting: you are finally looking for answers to your analytical questions.
This step should be straightforward if you identified clear questions to address and created table templates for each question.
This is where having a clear analytical plan comes in handy. One by one, you will go through your questions and produce tables that shed light on what you are investigating.
Be purposeful and intentional: remember that you are not on a fishing expedition, and that no single project can provide all the answers.
In your project, you would have one table template to fill with data. More specifically, you must calculate the proportion of women without any diploma; with a high school diploma; and with postsecondary education, by remoteness level, and by membership in racialized groups.
Deep dive. What are proportions? A complete review of actual data analysis techniques is out of scope for this course. However, often the easiest way to analyze data is to simply compare one given number with another. Watch the following video where you will be introduced to the basic concepts of proportions, ratios, and rates.
To complete the activity, watch the video: Proportions, ratios and rates (13:16)
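To make the idea of a proportion concrete, here is a minimal sketch of the calculation for one cell group of your table: each category count is divided by the group total and expressed as a percentage. The counts are hypothetical, and `proportions` is an illustrative helper name. With a sample-based source such as the long-form census, the counts would normally be weighted estimates rather than raw tallies.

```python
def proportions(counts):
    """Convert category counts into percentages of the group total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute proportions for an empty group")
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical counts for one group of women in one area type (not census data).
counts = {
    "No certificate, diploma or degree": 100,
    "High school diploma": 300,
    "Postsecondary education": 600,
}

shares = proportions(counts)
for category, pct in shares.items():
    print(f"{category}: {pct:.1f}%")
```

Because the three categories cover the whole group, the percentages add up to 100. Checking that they do is a quick way to catch a category that was left out of the calculation.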
You have calculated the proportion of women without any diploma; with a high school diploma; and with postsecondary education, for women overall and women from racialized groups, by areas of residence in Canada.
Take a look at the table here:
Looking only at the data for women overall and women from racialized groups, what patterns start to emerge?
Take a moment to think about it, or brainstorm with your team. Write down a few patterns that start to emerge in the box below.
Summarize your results as you go: It is a good idea to write down a few bullets beside your table(s) as you start to populate them to summarize the message. For instance, when looking at the data for women overall, we can say that…
Check your results as you go: Don’t forget to check for the quality of the estimates that you are producing. One important aspect to verify is the sample size behind each table cell. This is especially important if you are disaggregating your data at very fine levels of detail. Estimates based on small sample sizes will be associated with more variability (or uncertainty), and in some cases, should not even be published.
For your project, you planned to disaggregate your data on educational outcomes for 5 area types (ranging from easily accessible areas to very remote areas), and for 6 racialized groups (South Asian, Chinese, Black, Filipino, Arab, and Latin American). Can your data support this level of disaggregation?
You examine your data and notice that you will run into sample size issues for women living in very remote areas. There are very few South Asian, Chinese, Black, Filipino, Arab, and Latin American women living in the most remote regions of Canada. Based on Census confidentiality rules, you will have to suppress all data points for these women for “No certificate, diploma or degree” and “High school diploma”. The only publishable data point for these women is the category “Postsecondary education”.
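The idea behind this kind of suppression can be sketched as a simple rule: keep an estimate only when enough respondents stand behind it. The threshold of 10 used below is a hypothetical value chosen for illustration, not the actual census confidentiality rule, and the cell values and function names are made up for this example.

```python
MIN_CELL_COUNT = 10  # hypothetical threshold, not the actual census rule


def publishable(estimate, unweighted_count, minimum=MIN_CELL_COUNT):
    """Return the estimate, or None when the cell rests on too few respondents."""
    return estimate if unweighted_count >= minimum else None


def apply_suppression(cells, minimum=MIN_CELL_COUNT):
    """`cells` maps a label to (estimate, unweighted_count).
    Suppressed cells are returned as None."""
    return {
        label: publishable(estimate, count, minimum)
        for label, (estimate, count) in cells.items()
    }


# Made-up cells for one group in very remote areas: (percentage, unweighted count).
cells = {
    "No certificate, diploma or degree": (12.5, 4),
    "High school diploma": (20.0, 7),
    "Postsecondary education": (67.5, 40),
}

released = apply_suppression(cells)
for label, value in released.items():
    print(f"{label}: {'x' if value is None else f'{value:.1f}%'}")
```

In this sketch, suppressed cells are printed as "x". Always apply the confidentiality and quality rules of your own data source rather than a fixed threshold like the one assumed here.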
Here is your table template:
After much data crunching, you have filled out your table template. You are ready to move on to the next steps of the project!
Now that we've learned how to plan and implement an analytical project, we will discuss best practices for summarizing and sharing your findings.
To complete the activity, watch the video: Sharing your findings (11:38)
This video will take you through the last two steps of the analytical process.
In the video, you learned about the importance of interpreting your findings using clear and neutral language, and of staying true to your analytical question while telling your data story.
Analyzing data is the step where you turn the data into information. This is where it gets interesting: you are finally looking for answers to your analytical questions. Answers to your questions can be expressed as key messages.
You have now produced your table to provide information on the educational attainment of women from different areas of residence and from different racialized groups within Canada. It seems like you are now ready to summarize your findings.
Let’s practice the art of extracting key messages from tables. Here is your table template:
Looking at your complete data table, what would be the main messages? Please express them in plain and neutral language. Take a moment to think about it, or brainstorm with your team.
One way to get started can be to first focus on the patterns for women overall (highlight first data column). How are the different levels of education distributed across areas of residence for women overall?
One finding that pops out is that the lowest proportion of women with postsecondary education is observed among women living in very remote areas (40%) (highlight in the table), while the highest proportion of women with postsecondary education is observed among women living in easily accessible areas (69%) (highlight in the table).
Let’s turn to patterns for women from racialized groups (highlight second data column). What stands out? Interestingly, it looks like the highest proportion of racialized women with postsecondary education is observed among those living in very remote areas (80%) (highlight in the table).
How about women from specific groups (highlight data columns for South Asian, Chinese, Black, Filipino, Latin American, and Arab)? Which groups follow the same pattern? Which groups show a different pattern?
Pause and take a moment to investigate.
Always aim to express your key messages objectively. Let’s take a deep dive into an important guiding principle for analysis: Neutrality
You are now ready to prepare your work for dissemination and communicate your findings!
Presenting your findings clearly to others is one of the most challenging aspects of the analytical process. Let’s discuss in more detail the idea of using data to tell a story.
In the next video, we will talk about storytelling and describe the different components of a data story, including the data; the narrative; and the visualizations. We will also discuss how each component can be used to construct concise, informative, and engaging messages your audience will remember.
To complete the activity, watch the video: Telling the data story: How to create stories that matter (12:35)
In the video, you learned that the three most important components of a data story were: the data, the narrative, and the visualizations. We also talked about the importance of planning your data story by first determining who your audience is, what the goal of the story should be, and how it might be best presented.
You have analyzed your data table and carefully selected the key findings that provided an answer to your research questions. The goal of your data story was to describe how educational outcomes for women varied by areas of residence and membership in racialized groups.
You decide to share your findings via two formats: a short research report and a presentation. The goal of the research report is to inform the client that education outcomes for women vary by area of residence and by membership in racialized groups. The report contains all the methodological information that someone with a fair amount of data literacy would need. The presentation, on the other hand, is a more visual and engaging summary to quickly disseminate the storyline. Both formats have met their goal of informing their audience accordingly.
Deep dive. As we saw in the previous video, data visualizations are a very important part of your data story. The next video will give you a better understanding of data visualizations and how they can be used to present data in an engaging and aesthetically pleasing way.
Interested in learning more? Watch the video: Data Visualization: An introduction (10:54)
Can you think of effective, simple graphs that would help make your key messages pop?
Take a moment to think about it, or brainstorm with your team. Feel free to use Excel to create 1 or 2 charts.
Going back to when we were comparing the proportions of women with postsecondary education by areas of residence and racialized group membership, this is an example of a chart that would make the observed patterns pop:
Don’t forget to seek feedback on your work before disseminating your findings!
Remember: your work should go through an extensive review process before being considered “Final”. You can request feedback from a range of people such as colleagues, managers, subject matter experts and data or methodology experts. Ask your reviewers for feedback on different aspects of your work, such as the clarity of your analytical objectives, appropriateness of the data you've used, definition of concepts, review of literature, methodological approach, interpretation of your results, and clarity and neutrality of your writing.
In the next module, we will put it all together and walk you through a case study to give you an overview of how an analyst has gone through all the steps of the analytical process in their project.
In this video, we will walk you through an example to give you an overview of how an analyst has gone through all the steps of the analytical process in their project.
To complete the activity, watch the video: Analysis 101, part 4: Case study (9:01)
You've reached the end of the Foundations of data analysis: steps of the analytical process course.
What comes next?
Help us improve future courses and let us know what you thought of this one, by completing the course evaluation survey that you will receive by email.